Stumping e-rater: challenging the validity of automated essay scoring

Authors

  • Donald E. Powers
  • Jill Burstein
  • Martin Chodorow
  • Mary E. Fowles
  • Karen Kukich
Abstract

For this study, various writing experts were invited to "challenge" e-rater, an automated essay scorer that relies on natural language processing techniques, by composing essays in response to Graduate Record Examinations (GRE®) Writing Assessment prompts with the intention of undermining its scoring capability. Specifically, using detailed information about e-rater's approach to essay scoring, writers tried to "trick" the computer-based system into assigning scores that were higher or lower than deserved. E-rater's automated scores on these "problem essays" were compared with scores given by two trained human readers, and the difference between the scores constituted the standard for judging the extent to which e-rater was fooled. Challengers were differentially successful in writing problematic essays: expert writers were more successful in tricking e-rater into assigning scores that were too high than in duping it into awarding scores that were too low. The study provides information on ways in which e-rater, and perhaps other automated essay scoring systems, may fail to provide accurate evaluations if used as the sole method of scoring in high-stakes assessments. The results suggest possible avenues for improving automated scoring methods.
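The evaluation standard described in the abstract, flagging essays whose automated score diverges from the consensus of two human readers, reduces to simple arithmetic. Below is a minimal Python sketch of that comparison; the 6-point scale, the averaging of the two readers, and the 1-point discrepancy threshold are illustrative assumptions, not criteria taken from the paper.

# Illustrative sketch of the evaluation standard described in the abstract:
# an essay counts as having "fooled" the system when the automated score
# diverges from the human consensus. The 6-point scale and the 1-point
# threshold are assumptions for illustration, not the paper's exact criteria.

def human_consensus(reader1: float, reader2: float) -> float:
    """Average of two trained human readers' scores."""
    return (reader1 + reader2) / 2.0

def discrepancy(e_rater_score: float, reader1: float, reader2: float) -> float:
    """Signed difference: positive means e-rater scored too high."""
    return e_rater_score - human_consensus(reader1, reader2)

def fooled(e_rater_score: float, reader1: float, reader2: float,
           threshold: float = 1.0) -> bool:
    """Flag essays where automated and human scores diverge by more
    than `threshold` points (hypothetical cutoff)."""
    return abs(discrepancy(e_rater_score, reader1, reader2)) > threshold

# Example: an essay written to inflate its automated score.
print(fooled(e_rater_score=6, reader1=3, reader2=4))  # True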


Similar papers

Automated Essay Scoring With e-rater® V.2

E-rater® has been used by the Educational Testing Service for automated essay scoring since 1999. This paper describes a new version of e-rater (V.2) that is different from other automated essay scoring systems in several important respects. The main innovations of e-rater V.2 are a small, intuitive, and meaningful set of features used for scoring; a single scoring model and standards can be us...

Full text
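The snippet above notes that e-rater V.2 scores essays with a small, intuitive set of features combined in a single scoring model. As a rough illustration of that idea, the Python sketch below combines a handful of essay features in one linear model; the feature names, weights, and 1-6 clipping are hypothetical, not e-rater's actual model.

# Minimal sketch of a feature-based scoring model in the spirit of the
# description above: a small set of essay features combined by a single
# linear model. Feature names and weights are hypothetical; e-rater's
# actual features and weighting are documented in the cited paper.

FEATURE_WEIGHTS = {
    "grammar_error_rate": -1.5,   # errors per 100 words (lower is better)
    "word_count_log": 0.8,        # log of essay length
    "vocabulary_level": 0.6,      # mean word-frequency rank score
    "organization": 1.2,          # discourse-unit count score
}

def score_essay(features: dict[str, float], intercept: float = 1.0) -> float:
    """Weighted sum of features, clipped to a 1-6 score scale."""
    raw = intercept + sum(FEATURE_WEIGHTS[name] * value
                          for name, value in features.items())
    return max(1.0, min(6.0, raw))

print(score_essay({
    "grammar_error_rate": 0.5,
    "word_count_log": 2.7,
    "vocabulary_level": 1.4,
    "organization": 1.6,
}))  # 5.17 on the assumed 1-6 scale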

Automated Essay Scoring With E-rater v.2.0

E-rater has been used by the Educational Testing Service for automated essay scoring since 1999. This paper describes a new version of e-rater that differs from the previous one (V.1.3) with regard to the feature set and model building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and presents evidence on the validity and reliabil...

Full text


Construct Validity of e-rater® in Scoring TOEFL® Essays

This study examined the construct validity of the e-rater automated essay scoring engine as an alternative to human scoring in the context of TOEFL essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two e-rater scores were investigated in this study, the first based on optimally predicting the human essay score and the second based on e...

Full text

Enriching Automated Essay Scoring Using Discourse Marking

Electronic Essay Rater (e-rater) is a prototype automated essay scoring system built at Educational Testing Service (ETS) that uses discourse marking, in addition to syntactic information and topical content vector analyses, to automatically assign essay scores. This paper gives a general description of e-rater as a whole, but its emphasis is on the importance of discourse marking and argument pa...

Full text
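Topical content vector analysis, mentioned in the snippet above, can be illustrated with a toy sketch: represent an essay as a word-frequency vector and assign it the score point whose exemplar text it most resembles by cosine similarity. Everything below (the exemplar texts, the tokenization, the score scale) is an assumption for illustration, not the prototype's actual implementation.

# Illustrative sketch of topical content vector analysis as described
# above: score an essay by the cosine similarity between its word-frequency
# vector and vectors built from essays at each score point.
# The exemplar data and scoring scale here are toy assumptions.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def content_score(essay: str, exemplars: dict[int, str]) -> int:
    """Assign the score point whose exemplar text is most similar."""
    vec = Counter(essay.lower().split())
    return max(exemplars,
               key=lambda s: cosine(vec, Counter(exemplars[s].lower().split())))

# Toy exemplars keyed by score point (real systems pool many scored essays).
exemplars = {
    2: "the topic is good i agree with the topic",
    5: "the argument rests on an unstated assumption about causality",
}
print(content_score("this argument assumes causality without evidence", exemplars))  # 5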


Journal:
  • Computers in Human Behavior

Volume: 18, Issue:

Pages: -

Publication date: 2002